Fix gunicorn "Control server error" on kubernetes#7591
c41bf04 to d79792c
# On k8s, the default location may persist across restarts and cause permission errors
# See: <https://github.com/pulp/pulpcore/issues/7574>
What was the default location?
This file belongs in /run/ somewhere.
> This directory contains system information data describing the system since it was booted. Files under this directory must be cleared (removed or truncated as appropriate) at the beginning of the boot process.
> [...]
> System programs that maintain transient UNIX-domain sockets must place them in this [/run] directory or an appropriate subdirectory as outlined above.
https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s15.html
It defaults to the current directory: https://gunicorn.org/guides/gunicornc/#start-gunicorn-with-control-socket
I wonder why.
Hmmm, the code apparently says something else:
https://github.com/benoitc/gunicorn/blob/9aa54703f4950818aed538dbee9578e868375cc9/gunicorn/config.py#L3138-L3146
Ok, it was changed in 25.2: benoitc/gunicorn@0ad47db
I guess we should not do anything, then.
This change was meant to improve on gunicorn's default (as of version 25.1), but they have since improved it themselves.
Or do you think it's still worth it, to account for the case where 25.1 is installed?
So what I understand is that gunicorn tries XDG_RUNTIME_DIR first and falls back to HOME.
I would claim that the variable XDG_RUNTIME_DIR should have been set. Not sure whether the OS in the container or the container runtime is to blame, but the default gunicorn behaviour seems sound to me, and your change makes it unnecessarily rigid.
We should probably propagate the option instead so it stays possible to overwrite it.
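To make the lookup order described above concrete, here is a minimal sketch, not gunicorn's actual code. The function name `resolve_control_socket_dir` is hypothetical, and the final fallback to the current directory is an assumption added only so the function always returns something.

```python
import os


def resolve_control_socket_dir(environ=None):
    """Pick a directory for a control socket: XDG_RUNTIME_DIR first, then HOME."""
    env = os.environ if environ is None else environ

    # Preferred: the per-user runtime directory (usually below /run).
    runtime_dir = env.get("XDG_RUNTIME_DIR")
    if runtime_dir:
        return runtime_dir

    # Fallback discussed in this thread: the user's home directory.
    home = env.get("HOME")
    if home:
        return home

    # Assumption for this sketch only: use the current directory as a last resort.
    return os.getcwd()
```

In a container where XDG_RUNTIME_DIR is unset, this sketch returns HOME, which matches the behaviour discussed here.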
> We should probably propagate the option instead so it stays possible to overwrite it.
That sounds good.
> So what I understand is that gunicorn tries XDG_RUNTIME_DIR first and falls back to HOME.
In the first release of the feature, the default was the current directory (whatever that was...). In the following Y release they changed it to this, which I agree is sane.
d79792c to bd14a9a
My local flake8 keeps complaining about those
Exposes gunicorn's control socket path as a CLI option so operators can redirect it away from locations that cause permission errors on certain deployments (e.g. shared PVCs on k8s rolling updates). Silently ignored on gunicorn versions older than 25.1, the release that introduced the feature.

fixes: pulp#7574
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
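The behaviour described in this commit message could be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: the helper `build_gunicorn_options` is hypothetical, and the setting key `control_socket` is assumed rather than taken from the diff.

```python
def build_gunicorn_options(control_socket=None, gunicorn_version=(25, 2, 0)):
    """Return extra gunicorn settings; empty when nothing should be overridden."""
    options = {}

    # The control socket only exists from 25.1 on; older versions ignore the option.
    supports_control_socket = tuple(gunicorn_version[:2]) >= (25, 1)

    # Only pass the path when the operator set one, so gunicorn's default stays in effect.
    if control_socket is not None and supports_control_socket:
        # Assumption: "control_socket" is the name of the gunicorn setting.
        options["control_socket"] = control_socket

    return options
```

Leaving the key out when no path is given keeps gunicorn's own default and still allows an override.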
bd14a9a to 2f7e3ce
gunicorn 25.1.0 introduced a control socket (gunicornc) that defaults to gunicorn.ctl relative to the working directory. Since pulpcore-content sets its CWD to WORKING_DIRECTORY (/var/lib/pulp/tmp by default), the socket lands on the shared PVC and persists across pod restarts, causing Permission denied when a new pod tries to recreate it during a rolling update.
Default to /tmp/pulpcore-content.ctl, which is pod-local ephemeral storage. Users who want a different path can override via gunicorn.conf.py.
fixes: #7574
Assisted-by: Claude Code